REMARKS



Applicants thank Examiner Epperson for speaking with Diane M. Tsuda by phone on
September 28, 2004. Claims 1-13, 15-20, and 22-26 were pending prior to this response. By the 
present communication, claims 2, 13, 15, 18, and 26 have been cancelled without prejudice, no 
claims have been added, and claims 1, 10, 16, and 19 have been amended to define Applicants' 
invention with greater particularity. The claim amendments add no new matter, being fully 
supported by the Specification and original claims. Accordingly, claims 1, 3-12, 16, 17, 19, 20, 
and 22-25 are currently pending. 

The Rejection Under 35 U.S.C. § 112, First Paragraph

Applicants respectfully traverse the rejection of claims 1-13, 15-20, and 22-26 under 35 
U.S.C. § 112, first paragraph, for allegedly lacking description commensurate with the scope of
the claims, as applied to the currently pending claims. Applicants disagree with the Examiner's
application of Univ. of Rochester v. G.D. Searle & Co., Inc., 358 USPQ2d 1886 (Fed. Cir. 2004)
to the present claims. In the Univ. of Rochester case, the Applicants were claiming a compound 
that interacted with a particular chemical entity, namely PGHS-2. Presumably, a class of 
compounds having certain structural characteristics in common would be required. By contrast, 
in the present invention claim 1 describes "a method for identifying a polynucleotide encoding a 
enzyme of interest", but does not require that the polynucleotide encoding the enzyme of interest 
have any other property than its ability to hybridize to a probe polynucleotide that has been 
preselected by the practitioner as containing a probe-length portion of a DNA sequence that
encodes "an enzyme of interest". The user is free to select "the enzyme of interest," but the target 
in this case is unknown and cannot be described chemically other than by its hybridization with 
the probe molecule. Thus there is a fundamental difference between the claims of the Univ. of 
Rochester case and the claims at issue here. 

No description of the atoms making up either the probe molecule or the enzyme of 
interest needs to be provided, or can be provided, and Applicants respectfully submit that those 
of skill in the art would understand that the chemical interaction known in the art as
"hybridization" is a specific description of a chemical phenomenon. 

Therefore, it appears that the Examiner is interpreting the "written description 
requirement" as requiring that the claim be narrowed in an inappropriate manner. The invention 
methods, as defined by amended claim 1, do not require one skilled in the art to arrive at any 
particular chemical entity that could be described in terms of the atomic makeup or chemical 
structure. All that is required is hybridization of a polynucleotide to a complementary segment
in a probe bearing a detectable label. Moreover, the "common attribute" that functions in the 
invention claims is hybridization of complementary DNA sequences, which (under such 
conditions and for such time as to allow hybridization of complementary sequences, as required in 
claim 7) is a universal chemical phenomenon and not unpredictable, despite the Examiner's 
assertion to the contrary. 

What is "common" and predictable in all DNA hybridization reactions under suitable 
conditions is that A binds to T and C binds to G, just the same as oxygen binds to hydrogen 
under suitable conditions. Thus, complementary sequences under the right conditions inevitably 
bind to one another and the chemical laws governing such a phenomenon do not differ according 
to whether the DNA sequence from which the probe is constructed encodes one or a different 
type of enzyme. If an analogy to chemical interactions between specific atoms or combinations 
is required, Applicants submit that A, T, C and G are specific, commonly known chemical 
constructs and the binding affinity of complementary strands of DNA is so well known in the art 
that an Applicant is not required to meet the "heightened" written description requirement of 
Section 112, first paragraph.

Thus, Applicants submit that the Examiner's demand for application of the "heightened" 
written description requirement appropriate to an unpredictable art area (Office Action, page 6) 
is inapposite as applied to the present claims. In view of the universal applicability and 
predictability of the chemistry involved, and the total absence of any claim to discovery of a 
particular chemical construct or particular type of enzymatic activity, Applicants respectfully
submit that the description of the invention clearly allows persons of ordinary skill in the art to
recognize that [the inventors] invented what is claimed, as required by In re Gosteli, 872 F.2d
1008, 1012, 10 USPQ2d 1614 (Fed. Cir. 1989).

To further prosecution of this application, Applicants have amended claim 1 to recite "A 
method for identifying an enzyme of interest, comprising: (a) obtaining a plurality of 
polynucleotides derived from a mixed population of organisms or more than one organism; (b) 
normalizing the representation of organisms present in the plurality of polynucleotides to 
increase representation of rare species; (c) contacting a library containing clones of normalized 
polynucleotides from (b) with at least one oligonucleotide probe labeled with a detectable 
molecule, wherein the probe comprises at least a portion of a polynucleotide sequence encoding an 
enzyme of interest; (d) incubating the clones under such conditions and for such time as to allow 
hybridization of complementary sequences; (e) separating clones with an analyzer that detects the 
detectable molecule; (f) contacting the separated clones with a reporter system that comprises a 
substrate for the enzyme of interest; and (g) identifying clones capable of modulating expression 
or activity of the reporter system thereby identifying a polynucleotide that encodes the enzyme of 
interest." The amended language has been narrowed to recite an enzyme, a more clearly 
described probe, and conditions for identifying clones. Support for the amended claim language 
may be found in the Specification at page 34, lines 1-5; page 17, lines 14-18 and lines 21-29. 

Therefore, Applicants respectfully submit that claim 1 and those dependent thereon, as 
presently amended, meet the requirements of "written description" under 35 U.S.C. § 112, first
paragraph. Accordingly, reconsideration and withdrawal of the rejection are respectfully 
requested. 

The Rejection Under 35 U.S.C. § 112, Second Paragraph 

Applicants respectfully traverse the rejection of claim 22 under 35 U.S.C. § 112, second
paragraph, as allegedly being indefinite.

With regard to claim 22, the Examiner alleges that the phrase "small molecule" is a
relative term, thus introducing lack of clarity into the claim. However, Applicants submit that 
the phrase "small molecule" as used in Applicants' specification and claims does not refer 
specifically to the size of the molecule. Despite the Examiner's assertion that the broadest 
dictionary definition of the term "small" should prevail, Applicants submit that such an 
interpretation is "unreasonable" and thus in contravention of the rules covering definiteness, 
which hold that a dictionary definition does not necessarily apply when a term is used as a term 
of art. Applicants submit that those of skill in the art would understand "small" in the phrase 
"small molecule" as belonging to a term of art that distinguishes, for example, between 
enzymatic chemical compounds, and enzymatic polypeptides. 

Not only is the phrase "small molecule" used as a term of art in the Specification and in 
claim 22, it would be understood by those of skill in the art to be correctly used. Thus, 
Applicants disagree with the Examiner's assertion that Applicants "acting as their own 
lexicographers" have given the term "small" in the phrase "small molecule" a meaning that is 
"repugnant to the usual meaning of that term" (Office Action, transition from page 8 to page 9). 
Moreover, Applicants respectfully submit that, because the term "small molecule" has been used 
in the Specification as a term of art (as is "hybridize" and "hybridization"), there is no 
requirement for Applicants to provide a detailed explanation of the phrase that distinguishes 
between the dictionary meaning of the term "small" in a general context, and the meaning of the 
term when used in a phrase ("small molecule") that is a term of art. (See the Specification at
page 43, lines 4-18; page 45, lines 4-9; page 76, lines 31-32.) 

Accordingly, Applicants respectfully submit that those of skill in the art would readily 
understand the meaning of the phrase "small molecule" as used in claim 22, and respectfully 
request reconsideration and withdrawal of the rejection of claim 22 under 35 U.S.C. § 112,
second paragraph.

The Rejection Under 35 U.S.C. § 102(e)

Applicants respectfully traverse the rejection of claims 1-10, 13, 15-20 and 22-26 under 
35 U.S.C. § 102(e) as allegedly being anticipated by Thompson et al. (U.S. Patent No.
5,824,485; hereinafter "Thompson"). Claims 2, 13, 15, 18 and 26 have been cancelled without 
prejudice, thereby rendering the rejection moot as to these claims. Therefore, Applicants will 
address the rejection as to the currently presented claims. 

Applicants respectfully submit that the invention methods for identifying an enzyme of 
interest, as defined by amended claim 1, distinguish over the disclosure of Thompson by 
requiring: 

(a) obtaining a plurality of polynucleotides derived from a mixed population of 
organisms or more than one organism; 

(b) normalizing the plurality of polynucleotides to allow equal representation of all 
polynucleotides in the mixed population of organisms; 

(c) contacting a library containing clones of normalized polynucleotides from (b) 
with at least one oligonucleotide probe labeled with a detectable molecule, wherein the 
probe comprises at least a portion of a polynucleotide sequence encoding an enzyme of 
interest; 

(d) incubating the clones under such conditions and for such time as to allow 
hybridization of complementary sequences; 

(e) separating clones with an analyzer that detects the detectable molecule; 

(f) contacting the separated clones with a reporter system that comprises a substrate 
for the enzyme of interest; and 

(g) identifying clones capable of modulating expression or activity of the reporter 
system thereby identifying a polynucleotide that encodes the enzyme of interest. 

Applicants describe the effect of "normalizing" the sample of polynucleotides in the 
present application as follows:

[A] normalization of the environmental DNA present in these samples could allow more
equal representation of the DNA from all of the species present in the original sample. This 
can dramatically increase the efficiency of finding interesting genes from minor constituents 
of the sample which may be under-represented by several orders of magnitude compared to 
the dominant species. (Specification, p. 20, lines 25-29) 

It is clear from the use of the terms "minor constituents" and "several orders of magnitude
compared to the dominant species" that normalization refers to changing (i.e., equalizing) the abundance
of DNA representative of the various species in the sample. The techniques of normalization
disclosed by Applicants in the Specification produce "peaks representing the DNA from the 
organisms present in an environmental sample." Obtaining equal amounts of DNA from each 
peak thus equalizes the representation of all nucleic acid molecules in a sample, to form a 
normalized library, which is then screened for a desired bioactivity (Specification, Example 2, 
pages 106-7). Applicants also submit that this is a genomic method. To clarify the meaning of 
the term "normalizing" as used in claims 1 and 3, Applicants have amended both claims to recite 
"normalizing the polynucleotides ... to allow equal representation of all polynucleotides in the 
environmental sample" and "normalizing polynucleotides obtained from a mixed population of 
organisms to allow equal representation of all of the species present therein" respectively. It is also 
important to note that the methods of the present invention are directed to the abundance of DNA, 
not copy number of clones. 

To clarify the meaning of the term "normalizing" as used in claim 1, Applicants have 
amended the claim to recite "normalizing the plurality of polynucleotides to allow equal 
representation of all polynucleotides in the mixed population of organisms." It is also important 
to note that the methods of the present invention are directed to the abundance of DNA, not copy 
number of clones. 

Thompson is silent regarding "normalizing" environmental DNA to allow equal
representation of polynucleotides from all species present in a library assembled from a sample
including a mixed population of organisms, as is required by Applicants' claims. Applicants 
disagree with the assertion in the Office Action stating that Thompson discloses "normalizing the
plurality of polynucleotides," and citing Thompson col. 32, lines 14-16. Applicants respectfully
direct the Examiner's attention to the sentence in full context. Thompson states, "More than one
initial library may be pre-screened, and DNA from all the positive clones can be pooled and
used for making the biased combinatorial library." (Col. 32, lines 13-16). Thompson goes further
to state that, "Instead of using only the total pooled genomic DNA or cDNA of the donor 
organism(s), this approach will reduce the number of clones that need to be screened and 
increase the percentage of clones that will produce compounds of interest. The preselected 
fragments of DNA contain genes encoding partial or complete biosynthetic pathways, and may 
be preselected by hybridizing to an initial DNA library a plurality of probes prepared from 
known genes that may be related to or are involved in producing compounds of interest." (Col.
31, line 65 - Col. 32, line 7). Further, Thompson states: "The remaining DNA is thus biased
toward coding regions that encode proteins involved in secondary metabolism" (Col. 32, lines
54-56) (emphasis added).

As evidence of Thompson's alleged disclosure of normalizing, the Examiner asserts that 
Thompson is "'equalizing' slow growing members in a mixed population" and repairing damaged 
DNA to "equalize" the numbers of the damaged polynucleotides in the sample (Office Action, 
page 17). However, Thompson does not use the term "equalize" in connection with such 
activities and, in fact, Applicants disagree that such techniques would result in "increasing the 
representation of rare species in the sample." Thompson's repair of damaged DNA or cloning of 
uncultured organisms to avoid prejudice to slow growing species would, in both cases, tend to 
restore the natural distribution of polynucleotides of various species in the sample because
Thompson does not disclose that only rare polynucleotides would need to be repaired or
would be slow growing. For example, over-represented species are just as likely to have 
damaged DNA as underrepresented species. In short, Thompson fails to disclose any procedure 
by which the complexity of the DNA population obtained for the library is analyzed and treated 
in such a way that equal representation of all species in the mixed population is achieved in the library.

To reduce the number of clones that need to be screened, Thompson describes
pre-selection of DNA fragments for the screening library using probes and refers to this process as
"biasing" a library. Such probes are described as being "prepared from known genes that may be
related to or are involved in producing compounds of interest" (Thompson, Col. 32, lines 6-7).
However, rather than using the probes for screening (e.g., identifying molecules having a
nucleotide sequence complementary to the probes) of a library of already "normalized" naturally
occurring DNA molecules, as in Applicants' claim 1, Thompson uses the activity probe concept
for pre-screening, pre-selecting and preparing "chimeric" and "biased" combinatorial expression
libraries (see Thompson, Section 5.1.6) prior to screening.

Applicants respectfully submit that the dictionary definition of "normalization" applied 
by the Examiner as "to cause to conform to a norm or standard" is improperly applied to the 
claims at issue because "normalization" in the context of the present claims is "a term of art" and 
those of skill in the art would understand "normalization" as a term of art and not as having the 
common dictionary meaning applied by the Examiner. 

For example, "normalizing" and the advantages of libraries that are "normalized" are 
described in U.S. Patent 6,174,673 (hereinafter "the '673 patent"), which is incorporated by 
reference into the present application: 

One embodiment for forming a normalized library from an environmental sample 
begins with the isolation of nucleic acid from the sample. This nucleic acid can 
then be fractionated prior to normalization to increase the chances of cloning DNA 
from minor species from the pool of organisms sampled. DNA can be fractionated 
using a density centrifugation technique, such as a cesium-chloride gradient. 
When an intercalating agent, such as bis-benzimide is employed to change the 
buoyant density of the nucleic acid, gradients will fractionate the DNA based on 
relative base content. Nucleic acid from multiple organisms can be separated in 
this manner, and this technique can be used to fractionate complex mixtures of 
genomes. This can be of particular value when working with complex 
environmental samples. . . This "normalization" approach reduces the redundancy 
of clones from abundant species and increases the representation of clones from 
rare species. These normalized libraries allow for greater screening efficiency 
resulting in the identification of cells encoding novel biological catalysts. 

The '673 patent, incorporated by reference in the instant application, also teaches: 
"single-stranded nucleic acid representing an enrichment of rare sequences is amplified using 
techniques well known in the art, such as polymerase chain reaction (Barnes, 1994), and used to
generate gene libraries. This procedure leads to the amplification of rare or low abundance 
nucleic acid molecules, which are then used to generate a gene library which can be screened for 
a desired bioactivity." 

In further support of the Applicants' arguments regarding "normalization" as a term of 
art, copies of Soares et al., "Construction and characterization of a normalized cDNA library," 
Proc. Natl. Acad. Sci. USA, Vol. 91, pp. 9228-9232, September 1994 Biochemistry (Exhibit A, 
attached hereto) and Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd edition, 
Cold Spring Harbor Laboratory Press, 1989, pp. 8.6-8.10 (Exhibit B, attached hereto), have been 
provided for the Examiner's convenience. 

In addition, Applicants question the point of the Examiner's statement that the use of 
"comprising" language in the claims does not limit the "order" in which the method steps are to 
be carried out. Applicants respectfully submit that the claim language prescribes normalization
of the polynucleotides prior to formation of the library. For example, claim 1 recites: 
"contacting a library containing clones of normalized polynucleotides from (b) with at least one 
oligonucleotide probe labeled with a detectable molecule, wherein the probe comprises at least a 
portion of a polynucleotide sequence encoding an enzyme of interest." Thus, the claim language 
already requires the polynucleotides to be normalized before the library of clones is prepared and 
"contacted". 

In addition, Thompson's omission of a "normalizing step" as the term is understood in 
the art makes the order of the steps in the claim irrelevant. To establish anticipation under 35
U.S.C. § 102(e), each and every element of the claimed invention must be disclosed by a single
reference. As such, Thompson fails to disclose each and every element of claim 1 (and claims
dependent thereon) as would be required to establish anticipation under 35 U.S.C. § 102(e).
Therefore, reconsideration and withdrawal of the rejection over Thompson are respectfully
requested.

The Rejection Under 35 U.S.C. § 103

To establish a prima facie case of obviousness, three basic criteria must be met. First, 
there must be some suggestion or motivation, either in the references themselves or in the 
knowledge generally available to one of ordinary skill in the art, to modify the reference or to 
combine reference teachings. Second, there must be a reasonable expectation of success. 
Finally, the prior art reference (or references when combined) must teach or suggest all of the 
claim limitations. The teaching or suggestion to make the claimed combination and the 
reasonable expectation of success must both be found in the prior art, and not based on 
applicant's disclosure. In re Vaeck, 947 F.2d 488, 20 USPQ2d 1438 (Fed. Cir. 1991). The mere 
fact that references can be combined or modified does not render the resultant combination 
obvious unless the prior art also suggests the desirability of the combination. In re Mills, 916 
F.2d 680, 16 USPQ2d 1430 (Fed. Cir. 1990). 

Applicants respectfully traverse the rejection of claims 1-10, 13, 15-20, and 22-26 under 
35 U.S.C. § 103 as allegedly being unpatentable over Thompson (as above) and Miao et al.,
Biotechnology and Bioengineering (1993) 42:708-715, hereinafter "Miao". Claims 2, 13, 15, 18
and 26 have been cancelled without prejudice, thereby rendering the rejection moot as to these
claims. Therefore, Applicants will address the rejection as to the currently presented claims.

Applicants respectfully submit that the invention methods for identifying an enzyme of 
interest, as defined by amended claim 1, distinguish over the combined disclosures of Thompson 
and Miao by requiring: 

(a) obtaining a plurality of polynucleotides derived from a mixed population of 
organisms or more than one organism; 

(b) normalizing the plurality of polynucleotides to allow equal representation of all 
polynucleotides in the mixed population of organisms; 

(c) contacting a library containing clones of normalized polynucleotides from (b) 
with at least one oligonucleotide probe labeled with a detectable molecule, wherein the 
probe comprises at least a portion of a polynucleotide sequence encoding an enzyme of 
interest;

(d) incubating the clones under such conditions and for such time as to allow 
hybridization of complementary sequences; 

(e) separating clones with an analyzer that detects the detectable molecule; 

(f) contacting the separated clones with a reporter system that comprises a substrate 
for the enzyme of interest; and 

(g) identifying clones capable of modulating expression or activity of the reporter 
system thereby identifying a polynucleotide that encodes the enzyme of interest. 

The discussion above regarding the deficiencies of Thompson applies equally and is
incorporated here. In addition, Applicants submit that Thompson fails to suggest the invention
methods, as recited by amended claim 1, because Thompson fails to disclose or suggest 
normalizing the polynucleotides obtained from the mixed population to equalize the 
representation of all species prior to placement of the polynucleotides into clones and formation 
of the library to increase the chances of discovering an enzyme from an organism whose 
presence in the original sample is rare. Instead, Thompson discusses repair of damaged DNA and
methods for avoiding bias to slow growing members in a mixed population, either of which may as 
easily restore the original distribution of organisms in the sample as not, as Applicants have 
discussed above. Therefore, Applicants respectfully submit that Thompson would not motivate 
those of skill in the art to modify Thompson to arrive at the present invention methods because 
Thompson's comments regarding preparation of "activity biased" libraries would not motivate 
those of skill in the art to normalize the library by equalizing the representation of all organisms 
so that the chances of discovering an activity produced by a rare organism are increased. Indeed, 
Thompson's activity biasing of the library may well increase the chances that screening will not 
yield an enzyme, or other activity, produced by a rare organism because the commonest species 
in the sample may be selected by the activity probe and therefore have increased 
overrepresentation in Thompson's "biased" screening library as compared with the sample.

Applicants submit that the disclosure of Miao fails to remedy the deficiencies of
Thompson under 35 U.S.C. § 103. Miao's disclosure pertains to use of C12FDG as a fluorescent 
substrate in FACS screening of single bacterial cells of one species (i.e., E. coli). Thus, like 
Thompson, Miao is completely silent regarding screening of a normalized library prepared by 
treating the polynucleotides obtained from a mixed population to equalize the representations of 
all species in the original sample. Indeed, since Miao's disclosure does not pertain to screening 
of a plurality of species at all, as would be inherent in a "mixed population", Applicants submit 
that the combined disclosures of Thompson and Miao would be insufficient to motivate those of 
skill in the art to modify Thompson so as to yield the present invention. 

In addition, even if those of skill in the art were motivated by the combined disclosures of 
Thompson and Miao to arrive at the invention methods, Applicants submit that the cited art 
would fail to provide the reasonable expectation of success that is required to show 
unpatentability under 35 U.S.C. § 103. Because both Thompson and Miao fail to discuss any 
technique by which a diverse library can be adjusted to equalize the representation of all 
polynucleotides obtained from a mixed population of organisms, those of skill in the art would 
not be justified in assuming success in the outcome of any technique that might be devised. 

Accordingly, Applicants respectfully submit that the combined disclosures of Thompson 
and Miao, including Miao's disclosure regarding rapid screening using C12FDG, are not 
sufficient to teach or suggest the invention methods of amended claim 1. Thus, Applicants 
respectfully submit that the pending claims are not prima facie obvious over Thompson or the
combined disclosures of Thompson and Miao. Accordingly, reconsideration and withdrawal of 
the rejection under 35 U.S.C. § 103 are respectfully requested.

CONCLUSION



In summary, in view of the amendments and for the reasons set forth herein, Applicants 
respectfully submit that claims 1, 3-12, 16, 17, 19, 20, and 22-25 clearly and patentably define 
the invention and allowance of the claims is respectfully requested. If the Examiner would like 
to discuss any issues raised in the Office Action, the Examiner is encouraged to call the 
undersigned so that a prompt disposition of this application can be achieved. 

Enclosed is Check No. 568727 totaling $885.00 to cover the fees for the Request for
Continued Examination ($395) and Petition for Three-Month Extension of Time ($490).
However, the Commissioner is hereby authorized to charge any additional fees associated
with the filing submitted herewith, or credit any overpayments, to Deposit Account No. 50-1355.

ABSTRACT We have developed a simple procedure based on reassociation kinetics
that can reduce effectively the high variation in abundance among the clones of a
cDNA library that represent individual mRNA species. For this normalization, we
used as a model system a library of human infant brain cDNAs that were cloned
directionally into a phagemid vector and, thus, could be easily converted into
single-stranded circles. After controlled primer extension to synthesize a short
complementary strand on each circular template, melting and reannealing of the
partial duplexes at relatively low C0t, and hydroxyapatite column chromatography,
unreassociated circles were recovered from the flow-through fraction and
electroporated into bacteria, to propagate a normalized library without a
requirement for subcloning steps. An evaluation of the extent of normalization has
indicated that, from an extreme range of abundance of 4 orders of magnitude in the
original library, the frequency of occurrence of any clone examined in the
normalized library was brought within the narrow range of only 1 order of
magnitude.



The mRNAs of a typical somatic cell are distributed in three frequency classes
(1, 2) that are presumably maintained in representative cDNA libraries. The classes
at the two extremes (ca. 10% and 40-45% of the total, respectively) include members
occurring at vastly different relative frequencies. On average, the most prevalent
class consists of about 10 mRNA species, each represented by 5000 copies per cell,
whereas the class of high complexity comprises 15,000 different species each
represented by 1-15 copies only. Rare mRNAs are even more underrepresented in the
brain, a tissue exhibiting an exceptionally high sequence complexity of transcripts
(3-5).

Although even the rarest mRNA sequence from any tissue is likely to be represented
in a cDNA library of 10^7 recombinants, its identification is very difficult (its
frequency of occurrence may be as low as 2 x 10^-6 on average or even 10^-7 for
complex tissues such as the brain). Thus, for a variety of purposes, it is
advantageous to apply a normalization procedure and bring the frequency of each
clone in a cDNA library within a narrow range (generation of a perfectly equimolar
cDNA library is practically impossible in our experience). Normalized cDNA
libraries can facilitate positional cloning projects aiming at the identification
of disease genes, can increase the efficiency of subtractive hybridization
procedures, and can significantly facilitate genomic research pursuing chromosomal
assignment of expressed sequences and their localization in large fragments of
cloned genomic DNA (exon mapping). Normalization makes feasible the gridding of
cDNA libraries on filters at high density by reducing the number of clones to be
arrayed (gridding 10^7 clones for 1x coverage of a non-normalized library is not a
feasible task). Finally, by increasing the frequency of occurrence of rare cDNA
clones while decreasing simultaneously the percentage of abundant cDNAs,
normalization can expedite significantly the development of expressed sequence
databases by random sequencing of cDNAs.

Although cDNA library normalization could be achieved 
by saturation hybridization to genomic DNA (6), this ap- 
proach is impractical, since it would be extremely difficult to 
provide saturating amounts of the rarer cDNA species to the 
hybridization reaction. The alternative is the use of reasso- 
ciation kinetics: assuming that cDNA reannealing follows 
second-order kinetics, rarer species will anneal less rapidly 
and the remaining single-stranded fraction of cDNA will 
become progressively normalized during the course of the 
reaction (6-8). As we report here, we have used this kinetic 
principle to develop a method for normalization of a direc- 
tionally cloned cDNA library that has significant advantages 
over two previously reported similar procedures (refs. 7 and 
8; see Results and Discussion). 
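
The kinetic principle described above can be illustrated with a minimal sketch. It assumes ideal second-order reannealing, in which each species reassociates only with its own complement, so that the amount left single-stranded is C = C0/(1 + k C0 t). The rate constant `k` and the two starting abundances are hypothetical values chosen only for illustration; they are not taken from the experiments reported here.

```python
def remaining(fraction, cot, k=300.0):
    """Amount of one cDNA species still single-stranded after annealing.

    fraction: initial share of the species in the population (0..1)
    cot:      C0*t of the whole population
    k:        hypothetical second-order rate constant (illustration only)
    """
    # The species' own C0*t is its share of the population times the bulk Cot.
    return fraction / (1.0 + k * fraction * cot)


def fold_difference(abundant, rare, cot, k=300.0):
    """Ratio of abundant to rare species in the single-stranded fraction."""
    return remaining(abundant, cot, k) / remaining(rare, cot, k)


if __name__ == "__main__":
    abundant, rare = 1e-2, 1e-6  # hypothetical starting frequencies
    for cot in (0.0, 0.5, 5.5, 50.0):
        ratio = fold_difference(abundant, rare, cot)
        print(f"Cot {cot:5.1f}: abundant/rare = {ratio:9.1f}")
```

Under these assumptions the abundant species is depleted far faster than the rare one, so the ratio between them in the single-stranded fraction shrinks steadily as Cot increases.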

MATERIALS AND METHODS 

cDNA Library Construction. Poly(A)+ RNA isolated from 
the entire brain of a female infant (72 days old), who died in 
consequence of spinal muscular atrophy, was used for con- 
struction of a cDNA library (IB) as described (9, 10). As a 
primer for first-strand cDNA synthesis, we used the oligo- 
nucleotide 5'-AACTGGAAGAATTCGCGGCCGCAG- 
GAAT18-3', which contains a Not I site (GCGGCCGC). After 
ligation to HindIII adaptors, the cDNAs were digested with 
Not I and cloned directionally into the HindIII and Not I sites 
of a phagemid vector (L-BA) constructed by modifying 
pEMBL-9(+) (11). L-BA carries an ampicillin-resistance 
gene, plasmid and filamentous phage (f1) origins of replica- 
tion, and cloning sites (5' HindIII-BamHI-Not I-EcoRI 3'). 
Superinfection of bacteria with the helper phage M13K07 (12) 
converts duplex plasmids into single-stranded DNA circles 
containing message-like strands of the cDNA inserts. 

Preparation of Single-Stranded Library DNA. Plasmid 
DNA from the IB library was electroporated into Escherichia 
coli DH5α F' bacteria, and the culture was grown under 
ampicillin selection at 37°C to an OD600 of 0.2, superinfected 
with a 20-fold excess of the helper phage M13K07, and 
harvested after 4 hr for preparation of single-stranded plas- 
mids, as described (12). To eliminate contaminating double- 
stranded, replicative form (RF) DNA, 20 µg of the prepara- 
tion was digested with Pvu II (which cleaves only duplex 
DNA molecules), extracted with phenol/chloroform, diluted 
by addition of 2 ml of loading buffer (0.12 M sodium phos- 
phate buffer, pH 6.8/10 mM EDTA/1% SDS), and purified 
by hydroxyapatite (HAP) chromatography at 60°C, using a 
column preequilibrated with the same buffer (1-ml bed vol- 
ume; 0.4 g of HAP). After a 6-ml wash with loading buffer, 
this volume was combined with the flow-through fraction, 
and the sample was extracted twice with water-saturated 
2-butanol, once with dry 2-butanol, and once with water- 
saturated ether (3 volumes per extraction). The sample was 
desalted by passage through a Nensorb column (DuPont/ 
NEN) according to the manufacturer's specifications, con- 
centrated by ethanol precipitation, and electrophoresed in a 
low-melting agarose gel to remove helper phage DNA and 
any residual tRNA contaminant or oligoribonucleotides 
(breakdown products from the RNase A digestion used 
during purification). The region of the gel containing the 
single-stranded library was excised and, after β-agarase 
(New England Biolabs) digestion, the DNA was purified and 
ethanol-precipitated. 

cDNA Library Normalization. The IB cDNA library was 
normalized (see Fig. 1) in two consecutive rounds to derive 
the normalized libraries 1NIB and 2NIB, by using the follow- 
ing procedure. To synthesize a partial second strand of about 
200 nt by limited extension, 9 pmol of the oligonucleotide 
primer 5*-GGCCGCAGGAATi 3 -3' was added to 4.5 pmol of 
single-stranded IB library DNA in a reaction mixture 
containing 30 mM Tris-HCl (pH 7.5); 50 mM NaCl; 15 mM 
MgCl2; 1 mM dithiothreitol; 0.1 mM dNTPs; 2.5 mM ddATP, 
ddCTP, and ddGTP; and a trace of [α-32P]dCTP. The mixture 
was incubated for 5 min at 60°C and for 15 min at 50°C, the 
temperature was lowered to 37°C, 75 units of Klenow DNA 
polymerase (United States Biochemical) was added, and the 
incubation was continued for 30 min. The reaction was 
terminated by addition of EDTA (20 mM), extracted with 
phenol/chloroform, diluted with 2 ml of HAP loading buffer 
containing 50 µg of sonicated and denatured salmon sperm 
DNA carrier, and chromatographed on HAP, as described 
above. After washing, the partial duplex circles bound to 
HAP were eluted from the column with 6 ml of 0.4 M 
phosphate buffer, pH 6.8/10 mM EDTA/1% SDS. The 
concentration of phosphate in the eluate was lowered to 0.12 
M by addition of 14 ml of water containing 50 µg of DNA 
carrier, and the chromatographic step was repeated. The final 
eluate was extracted and desalted as described above and the 
DNA was ethanol-precipitated. The pellet (112 ng) was 
dissolved in 2.5 µl of formamide and the sample was heated 
for 3 min at 80°C under a drop of mineral oil to dissociate the 
DNA strands. For an annealing reaction, the volume was 
brought to 5 µl by adding 0.5 µl of 0.1 M Tris-HCl (pH 7.5) 
containing 0.1 M EDTA, 0.5 µl of 5 M NaCl, 1 µl (5 µg) of 
(dT)12-18, and 0.5 µl (0.5 µg) of the extension primer. The last 
two ingredients were added to block stretches of adenine 
residues [representing the initial poly(A) tails] and regions 
complementary to the oligonucleotide on the single-stranded 
DNA circles. The annealing mixture was incubated at 42°C, 
and a 0.5-µl aliquot was withdrawn at 13 hr (calculated Cot, 
5.5). The unhybridized single-stranded circles (normalized 
library) were separated from the reassociated partial du- 
plexes by HAP chromatography and then recovered from the 
flow-through fraction as described above. Since we, and 
others (13), have observed that the electroporation efficiency 
of partially repaired circular molecules is increased by about 
100-fold in comparison with single-stranded circles, the nor- 
malized cDNA circles were converted to partial duplexes by 
primer extension using random hexamers and T7 DNA 
polymerase (Sequenase version II; United States Biochem- 
ical), in a 10- to 20-µl reaction mixture containing 1 mM dNTPs. 
After addition of EDTA to 20 mM, phenol extraction, and 
ethanol precipitation, the cDNAs were dissolved in 10 mM 
Tris-HCl, pH 7.5/1 mM EDTA, and electroporated into 
competent bacteria (DH10B; GIBCO/BRL). To determine 
the number of transformants, 1 hr after the electroporation a 
10-µl aliquot of the culture was plated on an LB agar plate 
containing 75 µg/ml ampicillin (extrapolation from these data 
indicated that a normalized library of 2.5 x 10^6 colonies was 
obtained). Supercoiled plasmid DNA was then prepared 
(1NIB library) with a Qiagen plasmid kit (Qiagen, Chats- 
worth, CA). The same protocol was used for a second round 
of normalization (calculated Cot, 2.5) to derive the 2NIB 
library (1.3 x 10^7 transformants) from a preparation of 1NIB 
single-stranded circles, except that the HAP purification step 
after primer extension to synthesize short complementary 
strands was omitted. 

Colony Hybridization. For screening, colonies were grown 
on duplicate nylon filters (GeneScreenPlus; DuPont/NEN) 
that were processed as described (14) and hybridized at 42°C 
in 50% formamide/5x Denhardt's solution/0.75 M NaCl/ 
0.15 M Tris-HCl, pH 7.5/0.1 M sodium phosphate/0.1% 
sodium pyrophosphate/2% SDS containing sheared and de- 
natured salmon sperm DNA at 100 µg/ml. Radioactive 
probes were prepared by random primed synthesis (15, 16) 
using the Prime-It II kit (Stratagene). 

DNA Sequencing. Double-stranded plasmid DNA tem- 
plates were prepared by using the Wizard Minipreps DNA 
purification system (Promega) and sequenced from both ends 
by using the universal forward and reverse M13 fluorescent 
primers. Reactions were assembled on a Biomek 1000 work- 
station (Beckman) and then transferred to a thermocycler 
(Perkin-Elmer/Cetus) for cycle sequencing. Reaction prod- 
ucts were analyzed on an automated 370A DNA sequencer 
(Applied Biosystems). Nucleic acid and protein database 
searches were performed at the National Center for Biotech- 
nology Information server using the BLAST algorithm (17). 

RESULTS AND DISCUSSION 

Experimental Strategy. To develop a normalization proce- 
dure, shown schematically in Fig. 1, and at the same time 
increase the utility of the normalized model cDNA library, 
we first constructed a high-quality brain cDNA library (IB) 
that has the following features (10): the average size of a 
cDNA insert is 1.7 kb, often providing coding-region infor- 
mation by sequencing from the 5' end; the length of the 
segment representing the mRNA poly(A) tail is short, allow- 
ing an increase in the output of useful sequencing information 
from the 3' end; the frequency of nonrecombinant clones is 
extremely low (0.1%); and chimeric cDNAs have not been 
encountered, after single-pass sequencing of >2000 clones 
(10, 18). However, the latter analysis also demonstrated that 
13% of the clones in the IB library lacked poly(A) tails and 
were presumably derived from aberrant priming. 

FIG. 1. Diagram of the normalization procedure. Single- [...] 
resulting partial duplexes are purified from unprimed circles by 
HAP chromatography. Bound DNA is melted and reannealed 
to a relatively low Cot (see text). The remaining single-stranded [...] 

To preserve the length of the cDNAs, avoid differential 
loss of sequences, and alleviate a need for subcloning steps 
after normalization, we excluded from our protocol the use of 
PCR and chose directional cloning into a phagemid vector. 
Such vectors have been previously used advantageously for 
cDNA library subtractions (13), although normalization was 
not attempted. This cloning regime readily provides single 
strands that can be used both for annealing and for direct 
propagation in bacteria. In control experiments (data not 
shown), we assessed the frequency of occurrence of abun- 
dant cDNAs (encoding α- and β-tubulin, elongation factor 1α, 
and myelin basic protein) and demonstrated that, at least by 
this criterion, the representation of clones in the starting 
library remained unchanged after conversion into single- 
stranded circles. We also note that electrophoretic purifica- 
tion of the circles prior to use is necessary, to remove 
contaminating oligoribonucleotides (see Materials and Meth- 
ods), whose presence would result in undesirable internal 
priming events during the first step of our protocol. 



In contrast with our scheme, two other PCR-based nor- 
malization methods (7, 8) necessitate the use of subcloning 
steps. In one of these approaches (7), sheared cDNAs 
(0.2-0.4 kb) were ligated to a linker-primer, amplified by 
PCR, normalized kinetically, reamplified, and finally cloned 
directionally in such a way that only 3'-terminal sequences 
(almost exclusively 3' noncoding regions) were purposely 
preserved. The steps of the second scheme (8) were similar, 
except that the process started from cloned, randomly 
primed, and relatively short cDNAs, initially selected to 
minimize length-dependent differential PCR amplification. 
Thus, both coding and noncoding regions were represented in 
the final normalized library, but in pieces. 

While maintaining length and representation of mRNA 
regions, our protocol (Fig. 1) also addresses successfully the 
problem recognized in the first of the alternative approaches 
(7). It was considered that the 3' noncoding region is almost 
always unique to the transcript that it represents and is 
expected, therefore, to anneal only to its complement. In 
contrast, cross-hybridization of coding regions belonging to 
unequally represented members of oligo- or multigene fam- 
ilies could result in the elimination of rarer members from the 
population during the normalization process. This possibility 
is precluded in our method, which begins with the synthesis, 
from the 3' end of the cDNA, of a short complementary 
strand on the circular single-stranded cDNA template under 
controlled conditions, calibrated to yield strands with a 
narrow size distribution (200 ± 20 nt). Since the average 
length of 3' noncoding regions in brain mRNAs is 750 nt (19), 
the vast majority of synthesized complementary strands 
participating in the annealing reaction should be devoid of 
coding region sequences. However, after this partial exten- 
sion step, purification of the products by HAP chromatog- 
raphy is necessary to eliminate single strands of the IB library 
lacking poly(A) tails that cannot participate in primed syn- 
thesis. We repeat the chromatographic step to reduce the 
background to negligible levels, since after the first passage 
through the HAP column about 0.1% of pure single strands 
bind nonspecifically. However, during the second round of 
normalization to derive the 2NIB library, we omitted this step 
since we showed that 187 clones, which were picked ran- 
domly and sequenced from the 1NIB library (see below), all 
contained 3' poly(A) stretches. The remaining steps of our 
procedure entail melting and reannealing of the partial du- 
plexes, followed by purification of unreassociated circles 
(normalized library) by HAP chromatography and electro- 
poration into bacteria (Fig. 1). 

FIG. 2. Comparison of the frequencies of cDNA probes in the original (IB) and two normalized (1NIB and 2NIB) libraries. The indicated 
percentages of 28 cDNA sequences in the three libraries, tabulated in order of decreasing frequency in the IB library, are shown in the form 
of a histogram to visualize normalization. Frequencies were calculated from the number of positive colonies after hybridization of duplicate filters 
containing 500-180,000 colonies from each of the three cDNA libraries with the following 28 probes: 1, elongation factor 1α; 2, α-tubulin; 3, 
β-tubulin; 4, myelin basic protein; 5, aldolase; 6, 89-kDa heat shock protein; 7, γ-actin; 8, secretogranin; 9, microtubule-associated protein; 11, 
vimentin; 13, a cDNA randomly picked from the 1NIB library similar to a mouse cysteine-rich intestinal protein (1NIB-2, GenBank accession 
nos. T09996 and T09997); 19, a cDNA isolated from the 1NIB library homologous to the human endogenous retrovirus RTVLH2 (cDNA-20, 
accession nos. L13822 and L13823); 20, histone H2b.1; 23, a cDNA randomly picked from the 1NIB library encoding the human polyposis (DP1 
gene) mRNA ('NIBOT, accession nos. T10266 and T10267); 27, a cDNA randomly picked from the 1NIB library related to the human 
endogenous retrovirus ERV9 gene (1NIB-114, accession nos. T10086 and T10087); the remaining brain cDNAs are novel, and except for nos. 
10, 18, 21, and 25, they were randomly picked from the 1NIB library. 

Characterization of Normalized cDNA Libraries. To eval- 
uate the extent of normalization achieved with our method, 
we compared the IB, 1NIB, and 2NIB libraries by colony 
hybridization. For this analysis, we used 28 cDNA probes 
chosen to represent various frequencies of occurrence within 
a wide range (at least 4 orders of magnitude: 4.6% to 
<0.0006%) in the IB library (Fig. 2). However, an additional 
comparison of these results with independent theoretical 
estimates was necessary, to provide a further assessment of 
the degree of normalization, especially because the 1NIB 
library was derived after incubation to a relatively low Cot 
(5.5) during the reannealing step of our procedure. When 
relatively high Cot values were used in our initial attempts to 
normalize the IB library, we obtained unsatisfactory results 
(high background) that we attribute to technical problems 
inherent to the procedure. Nevertheless, a reevaluation of 
brain cDNA hybridization data (ref. 20; see Table 1) suggests 
that a relatively low Cot would suffice for our purpose, to 
bring the frequency of each library clone within a narrow 
range. 

For our calculations (Table 1), which should be regarded as 
rough but indicative estimates, we used a set of reliable 
hybridization data that are available only for mouse brain 
mRNAs (20), assuming that these measurements should not 
differ significantly among mammals (in all cases examined, 
including humans, the average amount of RNA per brain cell 
and the number of cells per gram of tissue are practically the 
same; see, e.g., refs. 29 and 30). These calculations show that 
at Cot 5.5, of the three kinetic classes of mRNAs, the most 
abundant species are drastically diminished, while all fre- 
quencies are brought within the range of 1 order of magnitude 
(Table 1, compare columns b and h and columns f and i). Our 
experimental results (Fig. 2) show that the same range was 
achieved after a single round of normalization at this Cot 
(5.5). Thus, for all practical purposes, a single cycle is 
probably sufficient. Secondary normalization (calculated Cot 
= 2.5) to derive the 2NIB library, although it did not result in 
a dramatic improvement, preserved the range of frequencies, 
while making the differences among individual sequences 
narrower overall (Fig. 2). Eleven of the 28 probes used in this 
analysis were derived from clones that were randomly picked 
from the 1NIB library. The overall frequency fold variation 
was reduced from >7667 (4.6/<0.0006) in the IB library to 
133 (0.4/0.003) and 26 (0.1/0.01) in the 1NIB and 2NIB 
libraries, respectively. However, some unexplained anoma- 
lies were also observed for a small minority of clones, whose 
already reduced frequencies in the 1NIB library were some- 
what increased in the 2NIB library (Fig. 2). 

To provide a further indication that normalization was 
successful, we sequenced from both ends 187 cDNA clones 
that were randomly picked from the 1NIB library (GenBank 
accession numbers T09994-T10011 and T10014-T10369). 
With the exception of 4 clones, which carried sequences 
corresponding to human mitochondrial 16S rRNA, all other 
cDNAs of this pool were unique, in agreement with the 
expectation for a normalized library. To further investigate 
the effect of the normalization procedure on the subset of 
mitochondrial 16S rRNA clones (1.4%, 1%, and 0.4% in the 
IB, 1NIB, and 2NIB libraries, respectively), we compared the 
sequences of a number of 16S rRNA clones isolated from 
both the IB and 1NIB libraries (kindly provided by M. Adams, 
Institute for Genomic Research and J. Sikela, University of 
Colorado). This analysis (data not shown) revealed that the 
16S rRNA clones isolated from 1NIB did not correspond to 
the predominant 16S rRNA species present in the IB library. 
Interestingly, in 17 of 19 16S rRNA clones sequenced from 
the IB library, the position of the A tract was the same as that 
present in the mature 16S rRNA. In contrast, all 8 clones 
sequenced from the 1NIB library represented truncated ver- 
sions of the 16S rRNA, in which different lengths of the 3' 
terminal sequence were absent. Such truncated clones are 
underrepresented in the IB library (2 of 19). Therefore, their 
frequency was increased by normalization, as expected, 
while the 16S rRNA clones of the most prevalent form were 
reduced. It is likely that the shorter clones represent bona 
fide copies of naturally occurring truncated 16S rRNA mol- 
ecules (refs. 31-33; to be discussed elsewhere). 

Table 1. Estimates of frequencies of brain mRNAs 

                     k_pfo      Complexity,  No. of RNA  Frequency per            Component at   Final frequency 
Component(a)  %(b)   (pure)(c)  kb(d)        species(e)  species, %(f)   k_so(g)  Cot 5.5, %(h)  per species, %(i) 
I             16     10         96           36          0.44            6.15     0.7            0.02 
II            46     0.165      5,800        2,150       0.02            0.10     44.2           0.02 
III           38     0.0079     122,000      45,000      0.0008          0.0048   55.1           0.0012 

(a) The experimental data of pseudo-first-order hybridization kinetics of cDNA tracer, which was synthesized from mouse brain poly(A)+ 
polysomal mRNA and driven by its template (20), were solved by computer (unconstrained fit) into three kinetic components, using the 
EXCESS function of a least-squares curve-fitting program (21). 

(b) The fraction of total occupied by each of the components is shown, after a minor correction (at completion, practically all of the tracer had 
reacted). These numbers (and all other numbers) in the table have been rounded. 

(c) The computer-calculated pseudo-first-order hybridization rate constant (k_pfo; M^-1 sec^-1) for each component was divided by each of the values 
in column b, to derive k_pfo (pure). 

(d) The complexity (i.e., length of unique sequence) was calculated by considering the data from a calibration kinetic standard: cDNA synthesized 
from encephalomyocarditis virus RNA (complexity, 9.7 kb) that was driven by its template [k_pfo (pure), 99]. Thus, each of the values in column 
d is the ratio (99 x 9.7)/k_pfo (pure). The complexity calculated for the rarest component (III) matches closely the values obtained from additional 
kinetic experiments using cDNA enriched for infrequent sequences (22, 23) and also the data of saturation experiments with single-copy genomic 
DNA tracer (24, 25). 

(e) The number of different RNA species in each component was estimated from their complexities by assuming that the average size of brain 
mRNA is 2.7 kb (26). A conjecture (26) that rare brain mRNAs are longer than this value (by potentially 5 kb on average) has not been supported 
by hard evidence. 

(f) The initial average frequency of an individual mRNA species of each component in the entire population of mRNA molecules is the ratio of 
values in column b to those in column e. 

(g) To assess the behavior of these kinetic components under the annealing conditions that we used for normalization (Cot, 5.5; length of 
complementary sequence in annealing strands, 0.2 kb), we first calculated the second-order reassociation rate constant (k_so; M^-1 sec^-1) for each 
component. For this calculation, we considered that the k_so of a single and pure kinetic component with a complexity of 1 kb reacting at a 
fragment length of 0.2 kb is 590 (27, 28). Thus each value is 590 divided by the complexity in column d. 

(h) To determine the percentage of the leftover of each component in the population at Cot 5.5, we first used the k_so values in column g to calculate 
the fraction remaining single-stranded, according to the equation C/C0 = 1/(1 + k Cot), and then normalized the derived values to a total of 100%. 

(i) The final average frequency of an individual mRNA species of each component is the ratio of values in column h to those in column e. 
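
The derived columns of Table 1 can be recomputed from its two measured columns by following the footnotes step by step. The sketch below is a rough check rather than the authors' original calculation: the function and variable names are hypothetical, the inputs are the rounded values printed in the table, and the constants (99, 9.7 kb, 2.7 kb, 590, Cot 5.5) are those quoted in the footnotes.

```python
# Constants quoted in the Table 1 footnotes.
STANDARD_K_PURE = 99.0   # k_pfo (pure) of the viral calibration standard
STANDARD_KB = 9.7        # complexity of that standard, kb
MEAN_MRNA_KB = 2.7       # assumed average size of brain mRNA, kb
K_SO_PER_KB = 590.0      # k_so of a 1-kb component at 0.2-kb fragment length

# Columns b and c of Table 1: (% of total, k_pfo (pure)).
COMPONENTS = {"I": (16.0, 10.0), "II": (46.0, 0.165), "III": (38.0, 0.0079)}


def table1(components=COMPONENTS, cot=5.5):
    """Recompute columns d-i of Table 1 from columns b and c."""
    rows = {}
    for name, (percent, k_pure) in components.items():
        complexity = STANDARD_K_PURE * STANDARD_KB / k_pure   # column d
        species = complexity / MEAN_MRNA_KB                   # column e
        k_so = K_SO_PER_KB / complexity                       # column g
        rows[name] = {
            "complexity_kb": complexity,
            "species": species,
            "initial_freq": percent / species,                # column f
            "k_so": k_so,
            # Share still single-stranded, before rescaling to 100%.
            "left": percent / (1.0 + k_so * cot),
        }
    total_left = sum(r["left"] for r in rows.values())
    for r in rows.values():
        r["percent_at_cot"] = 100.0 * r["left"] / total_left  # column h
        r["final_freq"] = r["percent_at_cot"] / r["species"]  # column i
    return rows


if __name__ == "__main__":
    for name, row in table1().items():
        print(name, {key: round(value, 4) for key, value in row.items()})
```

Because the inputs are already rounded, the recomputed entries agree with the printed table only to within rounding.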

Database searches (both BLASTN and BLASTX; ref. 17) 
revealed that of the 183 cDNAs examined, 152 (83%) were 
unknown (no hits), 15 (8.2%) corresponded to known human 
sequences, 5 (2.7%) were novel but related to known human 
sequences, 4 (2.2%) were homologous to mammalian se- 
quences, and 7 (3.8%) were homologous to known sequences 
from various nonmammalian organisms. 
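
As a quick arithmetic check on the breakdown above, the reported counts can be tallied and converted to percentages. This is only a sketch; the category labels are paraphrased from the text and the function name is hypothetical.

```python
# Counts reported in the text for the cDNAs examined by database search.
COUNTS = {
    "unknown (no hits)": 152,
    "known human sequences": 15,
    "novel, related to known human sequences": 5,
    "homologous to mammalian sequences": 4,
    "homologous to nonmammalian sequences": 7,
}


def percentages(counts):
    """Return each category as a percentage of the total, to one decimal."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    print("total clones:", sum(COUNTS.values()))
    for label, pct in percentages(COUNTS).items():
        print(f"{label}: {pct}%")
```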

In contrast to these results, when 1633 randomly picked 
clones from the non-normalized IB library were sequenced 
mostly (88%) from the 5' end, the percentage of unknown 
sequences was significantly lower than in our case (63%), 
while about 30% of the clones were sequenced twice or more 
(up to 50) times (10). Similar results were obtained by 
sequencing 493 random IB clones exclusively from the 3' end 
(18). Of the initially abundant cDNAs, which were sequenced 
multiple times in both of these studies, those encoding 
elongation factor 1α, α-tubulin, β-tubulin, myelin basic pro- 
tein, and γ-actin (corresponding to our probes 1-4 and 7; Fig. 
2) were absent from the pool of 187 clones that we examined. 
Moreover, only 15 of the unique 183 clones that we se- 
quenced from the 1NIB library (8%) had been previously 
identified in the collection of the sequenced 1633 IB clones. 

Eighteen of the unknown cDNAs that we sequenced (10% 
of the total clones) carried Alu repetitive elements (6 at the 5' 
end; 11 at the 3' end; and 1 at both ends). Thus, as previously 
observed (8), the frequency of cDNAs containing Alu repeats 
is not reduced by normalization. This phenomenon can be 
attributed to sequence heterogeneity among Alu family mem- 
bers, which are able to form imperfect hybrids that probably 
cannot bind to HAP. However, this is not a disadvantageous 
property, since it prevents elimination of rare Alu-carrying 
cDNAs from the population. 

To assess whether the normalization procedure had 
skewed the distribution of lengths favoring shorter cDNA 
clones, Southern blots of released inserts from the IB, 1NIB, 
and 2NIB plasmids were hybridized with several of the cDNA 
probes used in Fig. 2 individually. The results (not shown) 
demonstrated that the intensity of hybridization signals var- 
ied as expected, but the size of each hybridizing fragment 
remained the same. 

Note. Sasaki et al. (34) have described an alternative normalization 
procedure, in which a cDNA library was constructed following 
depletion of abundant mRNA species by sequential cycles of hy- 
bridization to matrix-bound cDNA. However, this procedure does 
not seem to be more advantageous than ours, while its actual 
practical potential remains to be assessed, as the putative normalized 
library was not adequately characterized. 

Abundant mRNAs 



Initially, cDNA cloning was used to obtain copies of abundant mRNAs such 
as those encoding globin, immunoglobulins, and ovalbumin. In these cases 
the RNA species of interest constitutes as much as 50-90% of the total 
poly(A)+ cytoplasmic RNA isolated from specific types of differentiated cells. 
Consequently, no further purification of the particular mRNA is required 
before double-stranded cDNA is synthesized and cloned. The desired cDNA 
clones can easily be identified by nucleic acid hybridization. The probes 
consist either of 32P-labeled single-stranded cDNA synthesized in vitro by 
reverse transcriptase, using as the template mRNA preparations that are 
rich in the sequences of interest, or of mRNA that has been partially 
fragmented by limited alkaline hydrolysis and end-labeled by phosphor- 
ylation. As a good approximation, the mRNA sequences of interest will be 
represented in both the probe and the cloned double-stranded cDNAs in 
proportion to their abundances in the original preparation of mRNA. In cases 
such as globin, immunoglobulins, and ovalbumin, the chances are high that 
any colony hybridizing strongly to the probe will contain the desired DNA 
sequences. Although used extensively in the early days of cDNA cloning, this 
method no longer finds wide application, since few systems remain in which 
interesting uncloned mRNAs represent a sufficiently high proportion of the 
starting population. 



Low-abundance mRNAs 



mRNAs that represent less than 0.5% of the total mRNA population of the 
cell are classified as "low-abundance" or "rare" mRNAs. The isolation of 
cDNA clones for mRNAs of this type presents two major problems: (1) 
construction of a cDNA library whose size is sufficient to ensure that the 
clone of interest has a good chance of being represented and (2) identification 
and isolation of the clone(s) of interest. 



Methods of Enrichment 



A typical mammalian cell contains between 10,000 and 30,000 different 
mRNA sequences (Davidson 1976). Not all of these sequences are repre- 
sented equally in the steady-state population of mRNA molecules. Instead, 
the proportional representation of each sequence depends on its rate of 
synthesis and half-life: Genes that are actively transcribed into stable 
mRNAs will make a greater contribution to the pool of mRNA molecules than 
genes that are transcribed sluggishly into less stable mRNAs. Williams 
(1981) has determined the number of clones necessary to construct a com- 
plete library from a human fibroblast cell that contains approximately 
12,000 different mRNA sequences. Low-abundance mRNAs (<14 copies/cell) 
constitute approximately 30% of the mRNA, and there are about 11,000 
different mRNAs belonging to this class. The minimum number of cDNA 
clones required to obtain a complete representation of mRNAs of this class is 
therefore 11,000/0.30 = 37,000. Of course, because of sampling variation 
and/or preferential cloning of certain sequences, a much larger number of 
recombinants must be obtained to increase the chances that any given clone 
will be represented in the library. The number of clones required to achieve a 
given probability that a low-abundance mRNA will be present in a cDNA 
library is 

    N = ln(1 - P) / ln(1 - 1/n) 

where N = the number of clones required, P = the probability desired (usually 
0.99), and l/n = the fractional proportion of the total mRNA that is repre- 
sented by a single type of rare mRNA. 

Therefore, to achieve a 99% probability of obtaining a cDNA clone of an 
mRNA present in human fibroblasts at a frequency of approximately 14 
molecules/cell: 

P = 0.99 

1/n = 1/37,000 

N = 170,000 
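
The worked example above can be reproduced with a short sketch. It assumes the standard sampling relationship N = ln(1 - P)/ln(1 - 1/n), which is consistent with the definitions of N, P, and 1/n given in the text and with the quoted result; the function name is hypothetical.

```python
import math


def clones_required(probability, frequency):
    """Number of clones N needed so that a sequence present at the given
    fractional frequency (1/n) is represented at least once with the
    desired probability P: N = ln(1 - P) / ln(1 - 1/n)."""
    # log1p(-x) evaluates ln(1 - x) accurately for very small x.
    return math.log(1.0 - probability) / math.log1p(-frequency)


if __name__ == "__main__":
    # Values from the text: P = 0.99, 1/n = 1/37,000.
    n_clones = clones_required(0.99, 1.0 / 37_000)
    print(f"clones required: {n_clones:,.0f}")
```

The same function can be applied to rarer messages by passing a smaller frequency, which shows how quickly the required library size grows as abundance falls.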

Unfortunately, many mRNAs of interest are present at even lower levels (1 
molecule/cell is not unusual [Toole et al. 1984; Wood et al. 1984]). Further- 
more, it is often necessary to clone cDNAs from populations of mRNAs 
isolated from tissues that consist of several cell types. In such cases, the 
frequency at which the sequences of interest are represented in the initial 
preparation of mRNA may be reduced still further, and it then becomes 
necessary to construct and screen libraries that contain several million 
independent cDNA clones. During the last few years, the efficiency with 
which cDNA can be synthesized and cloned has increased to the point where 
cDNA libraries of this size can be generated routinely from 10 µg or less of 
poly(A)+ mRNA. In principle, there is no a priori reason why even the most 
difficult cDNA clones (those corresponding to a very rare mRNA of large 
size) cannot be identified in such comprehensive libraries. However, screen- 
ing large numbers of cDNA clones is both tedious and expensive. Methods 
have therefore been devised to enrich either the starting population of mRNA 
molecules or double-stranded cDNA synthesized from it for sequences of 
interest. Enrichment allows the size of the cDNA library to be reduced and de- 
creases the cost and labor involved in screening for the desired cDNA clones. 

It is difficult to offer specific guidelines regarding the circumstances that 
require enrichment procedures. As a rule of thumb, fractionation of mRNA is 
probably unnecessary if the cDNA of interest is expected to be present at a 
frequency ≥1 in 10^6 in a library of cDNA clones synthesized from unfraction- 
ated mRNA. Enrichment becomes more attractive as the number of clones to 
be screened increases above one million. When designing a scheme to clone a 
specific cDNA, it is therefore important to know the approximate frequency 
with which the mRNA of interest occurs in the bulk, unfractionated popula- 
tion of mRNA molecules. In the absence of nucleic acid probes, an indirect 
method must be used to measure this frequency. Usually, the mRNA 
preparation is translated in a cell-free system and the total amount of 
radioactivity incorporated into protein is measured. The polypeptide of 
interest is then immunoprecipitated and identified by electrophoresis through 
an SDS-polyacrylamide gel. The amount of radioactivity in the excised band 
is then measured and used to calculate the proportion of the total counts that 
have been incorporated into the protein of interest. This proportion is taken 
as a measure of the frequency with which the mRNA occurs in the bulk 
population. Despite its obvious limitations, this method usually yields 
estimates that are sufficiently reliable to allow rational schemes for cDNA 
cloning to be devised. 
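
The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration: the function names and the example counts are hypothetical, and the default threshold simply encodes the one-in-a-million rule of thumb given above.

```python
def estimate_mrna_frequency(band_counts, total_counts):
    """Proportion of the total incorporated radioactivity found in the
    excised band, taken as an estimate of the mRNA's frequency."""
    if total_counts <= 0:
        raise ValueError("total_counts must be positive")
    if not 0 <= band_counts <= total_counts:
        raise ValueError("band_counts must lie between 0 and total_counts")
    return band_counts / total_counts


def enrichment_advisable(frequency, threshold=1e-6):
    """Rule of thumb: enrichment becomes attractive when the expected
    frequency falls below about 1 in 10^6 (an assumed default)."""
    return frequency < threshold


# Hypothetical counts: 50 cpm in the band out of 2,000,000 cpm in total protein.
frequency = estimate_mrna_frequency(50, 2_000_000)
needs_enrichment = enrichment_advisable(frequency)
```

With these made-up counts the estimated frequency is well above 1 in 10^6, so the rule of thumb would not call for enrichment.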

Clearly, fractionation works best for mRNAs that are much larger or 
smaller in size than the bulk mRNA of the cell. The modal size of the mRNA 
population extracted from most types of mammalian cells is approximately 
1.8 kb, and mRNAs smaller in size than 700 bases or larger than 4 kb can be 
enriched at least tenfold by a single round of density gradient centrifugation 
carried out under denaturing conditions. However, it is important to re- 
member that it is not possible to predict with certainty the size of an mRNA 
from the size of a protein for which it codes. There is considerable variation 
in the sizes of the untranslated regions of mRNAs (particularly the 3' 
untranslated regions); many proteins purified from cells are cleavage products
of larger precursors and many undergo extensive posttranslational
modification. However, the size of the unmodified polypeptide chain provides 
a minimal estimate of the size of the mRNA: 10,000 daltons of an average 
polypeptide is encoded by approximately 280 bases of mRNA. 
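
As a rough illustration of this conversion, the following Python sketch turns the mass of an unmodified polypeptide into a minimal mRNA length and checks it against the 4-kb cutoff mentioned above. The function names and the example masses are hypothetical. Because the result is only a lower bound, the sketch tests the upper cutoff alone: a small minimal estimate does not show that the mRNA is actually small.

```python
BASES_PER_10_KDA = 280  # approximate figure quoted in the text


def minimal_mrna_length(polypeptide_daltons):
    """Lower bound on mRNA length (bases) from the mass of the
    unmodified polypeptide chain; untranslated regions are ignored."""
    if polypeptide_daltons < 0:
        raise ValueError("polypeptide_daltons must be non-negative")
    return polypeptide_daltons / 10_000 * BASES_PER_10_KDA


def exceeds_upper_cutoff(minimal_bases, upper_cutoff=4000):
    """True when even the minimal estimate is larger than about 4 kb,
    the size above which one gradient run can give useful enrichment."""
    return minimal_bases > upper_cutoff


# Hypothetical example: a 150,000-dalton polypeptide.
large = minimal_mrna_length(150_000)          # 4200.0 bases
large_is_candidate = exceeds_upper_cutoff(large)

# A 50,000-dalton polypeptide gives only a lower bound of 1400 bases,
# which by itself says nothing about whether fractionation will help.
medium = minimal_mrna_length(50_000)
```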

FRACTIONATION OF mRNA BY SIZE 

The simplest method to enrich preparations of mRNA for sequences of 
interest is to fractionate them according to size. Electrophoresis through 
agarose gels gives the best separation of molecules of mRNA of different 
sizes, but the recovery of RNA from gel slices is generally poor. Sedimenta- 
tion through sucrose gradients formed in nondenaturing solvents results in 
good recovery, but the presence of secondary structure in the RNA often 
confounds effective fractionation. The method of choice, therefore, is sucrose 
gradient centrifugation in the presence of an agent, such as methylmercuric 
hydroxide, that denatures secondary structure in RNA (Schweinfest et al. 
1982) (for experimental protocol, see Chapter 7, page 7.35). Each fraction is 
then assayed for the presence of the mRNA that codes for the relevant 
polypeptide. Typically, an aliquot of the RNA in each fraction is translated in 
a cell-free system and the resulting polypeptides are analyzed by immuno- 
precipitation and electrophoresis through polyacrylamide gels. Alternatively, 
aliquots are injected into Xenopus oocytes (for review, see Melton 1987) and 
the resulting products are assayed either for biological activity or by immuno- 
precipitation and gel electrophoresis. The fraction that directs the synthesis 
of the greatest amount of the polypeptide product is then used as the starting 
material for construction of a cDNA library. 

FRACTIONATION OF cDNA 

Until a few years ago, fractionation of mRNA was the method of choice for 
cloning of mRNAs that code for large proteins (e.g., rat skeletal muscle 
tropomyosin [Medford et al. 1980] and chick creatine kinase [Schweinfest et 
al. 1982]). However, as methods for the synthesis of cDNA have improved, 
fractionation of double-stranded cDNA has become a more practical alterna- 
tive, and there are now many examples of extremely large cDNAs that have 
been cloned by fractionating cDNA rather than the mRNA from which it was 
copied (e.g., human factor VIII:C [Toole et al. 1984; Wood et al. 1984] and
human sucrase-isomaltase [Hunziker et al. 1986]). Fractionation of cDNA 
has major advantages: DNA is less susceptible than mRNA to degradation by 
contaminating nucleases; it can be fractionated more accurately by electrophoresis
through agarose gels; and, finally, since the fractionation can be
carried out at a late stage during the cDNA cloning protocol, the chances of 
subsequent mishaps are reduced and the probability of obtaining a full-length 
clone of cDNA is increased. Fractionation is usually carried out after all of 
the enzymatic reactions involved in cDNA synthesis have been completed and 
just before the cDNA is inserted into a vector. In the detailed protocol 
described later in this chapter, fractionation is carried out after synthetic 
linkers, added to the termini of double-stranded cDNA, have been digested 
with a restriction enzyme. The cDNA is fractionated by electrophoresis 
through an agarose gel of appropriate porosity, using markers whose sizes 
are known accurately. Molecules of the desired size are recovered and 
inserted into the vector. 

IMMUNOLOGICAL PURIFICATION OF POLYSOMES 

An alternative method of enrichment involves the use of antibodies to purify 
polysomes that are synthesizing the polypeptide of interest. The technique 
described originally (Palacios et al. 1972; Schechter 1973), which involved the 
immunoprecipitation of polysomes, worked well for mRNAs encoding abun- 
dantly synthesized proteins such as albumin and immunoglobulin, although 
attempts to apply the method to mRNAs of lesser abundance were generally 
disappointing. However, the more recent use of immunoaffinity columns 
(Schutz et al. 1977) and protein A-Sepharose columns (Shapiro and Young 
1981) has led to a resurgence of the technique. For example, Korman et al. 
(1982) used a monoclonal antibody directed against the heavy chain of the 
human HLA-DR histocompatibility antigen to bind polysomes synthesizing the 
nascent protein to protein A-Sepharose columns. The polysomes were then 
dissociated with EDTA and the mRNA isolated by oligo(dT) chromatography. 
The immunoaffinity-purified mRNA, which represented only 0.01-0.05% of 
the total mRNA, was used both to prepare cDNA probes and to construct 
cDNA clones. Using similar methods with polyclonal antisera, Kraus and 
Rosenberg (1982) obtained a 6300-fold purification of the mRNA that codes 
for rat liver cystathionine β-synthase and Russell et al. (1983) isolated cDNA
clones for the bovine low-density lipoprotein receptor, whose mRNA is
present at approximately 80 copies per cell in bovine adrenal cells. 

Although a powerful technique, immunoaffinity purification of polysomes 
cannot be applied universally. First, it clearly will not work unless a reliable 
source of material is available from which to isolate functional polysomes. 
This is not always possible, especially when the starting material is a tissue 
or organ that is not commonly available. Second, it has not yet been shown to 
work for mRNAs that are extremely rare (1 molecule/cell or less). Further- 
more, the success of the method depends entirely on the specificity, avidity, 
and type of the particular antibody, and it is not always possible to translate 
the results obtained with one antibody directly to another. Finally, the 
method requires the use of relatively large quantities of antibody. Partly 
because of these difficulties, immunoaffinity purification of polysomes has
been superseded by development of cDNA cloning vectors (e.g., λgt11 and
λgt18-23) that allow the direct isolation of cDNA clones that code for specific
antigens. Whether or not the method is used extensively in the future will 
depend on improving its sensitivity to the point where it provides significant 
enrichment of polysomes carrying extremely rare mRNAs. 